Abstract

Traceroute measurements are one of our main instruments to shed light onto the structure and properties of
today's complex networks such as the Internet. This paper studies the feasibility and infeasibility of inferring the
network topology given traceroute data from a worst-case perspective, i.e., without any probabilistic assumptions
on, e.g., the nodes' degree distribution. We attend to a scenario where some of the routers are anonymous,
and propose two fundamental axioms that model two basic assumptions on the traceroute data: (1) each trace
corresponds to a real path in the network, and (2) the routing paths are at most a factor $1/\alpha$ off the shortest
paths, for some parameter $\alpha \in (0, 1]$. In contrast to existing literature that focuses on the cardinality of the
set of (often only minimal) inferrable topologies, we argue that a large number of possible topologies alone is
often unproblematic, as long as the networks have a similar structure. We hence seek to characterize the set
of topologies inferred with our axioms. We introduce the notion of star graphs whose colorings capture the
differences among inferred topologies; star graphs also allow us to construct inferred topologies explicitly. We find that,
in general, inferrable topologies can differ significantly in many important aspects, such as the nodes' distances
or the number of triangles. These negative results are complemented by a discussion of a scenario where the
trace set is best possible, i.e., "complete". It turns out that while some properties such as the node degrees are
still hard to measure, a complete trace set can help to determine global properties such as the connectivity.
1 Introduction



Surprisingly little is known about the structure of many important complex networks such as the Internet. One 
reason is the inherent difficulty of performing accurate, large-scale and preferably synchronous measurements 
from a large number of different vantage points. Another reason lies in privacy and information hiding issues: for
example, network providers may seek to hide the details of their infrastructure to avoid tailored attacks.

Since knowledge of the network characteristics is crucial for many applications (e.g., RMTP [12] or
PaDIS [?]), the research community implements measurement tools to analyze at least the main properties of
the network. The results can then, e.g., be used to design more efficient network protocols in the future.

This paper focuses on the most basic characteristic of the network: its topology. The classic tool to study 
topological properties is traceroute. Traceroute allows us to collect traces from a given source node to a set of 
specified destination nodes. A trace between two nodes contains a sequence of identifiers describing the route 
traveled by the packet. However, not every node along such a path is configured to answer with its identifier. 
Rather, some nodes may be anonymous in the sense that they appear as stars ('*') in a trace. Anonymous nodes 
complicate the exploration of a topology because even a small number of anonymous nodes may increase the
spectrum of inferrable topologies that correspond to a trace set $\mathcal{T}$.

This paper is motivated by the observation that the mere number of inferrable topologies alone does not
contradict the usefulness or feasibility of topology inference; if the set of inferrable topologies is homogeneous in the
sense that the different topologies share many important properties, the generation of all possible graphs can
be avoided: an arbitrary representative may characterize the underlying network accurately. Therefore, we identify
important topological metrics such as diameter or maximal node degree and examine how "close" the possible 
inferred topologies are with respect to these metrics. 

1.1 Related Work 

Arguably one of the most influential measurement studies on the Internet topology was conducted by the Faloutsos
brothers [8], who show that the Internet exhibits a skewed structure: the nodes' out-degree follows a power-law
distribution. Moreover, this property seems to be invariant over time. These results complement discoveries of
similar distributions of communication traffic, which is often self-similar, and of the topologies of natural networks
such as human respiratory systems. This property allows us to give good predictions not only on node degree
distributions but also, e.g., on the expected number of nodes at a given hop-distance. Since [8] was published,
many additional results have been obtained, e.g., by using a distributed computing approach to increase the
number of measurement points [6]. However, our understanding remains preliminary, and the topic continues to
attract much attention from the scientific communities. In contrast to these measurement studies, we pursue a more
formal approach, and a complete review of the empirical results obtained over the last years is beyond the scope of
this paper.

In the field of network tomography, topologies are explored using pairwise end-to-end measurements, without
the cooperation of nodes along these paths. This approach is quite flexible and applicable in various contexts, e.g.,
in social networks [?]. For a good discussion of this approach as well as results for a routing model along shortest
and second shortest paths, see [?]. For example, [?] shows that for sparse random graphs, a relatively small number
of cooperating participants is sufficient to discover a network fairly well.

The classic tool to discover Internet topologies is traceroute [?]. Unfortunately, there are several problems with
this approach that render topology inference difficult, such as aliasing or load-balancing, which has motivated
researchers to develop new tools such as Paris Traceroute [5, 10]. Another complication stems from the fact that
routers may appear as stars in the trace since they are overloaded or since they are configured not to send out any 
ICMP responses. The lack of complete information in the trace set renders the accurate characterization of Internet 
topologies difficult. 

This paper attends to the problem of anonymous nodes and assumes a conservative, "worst-case" perspective 
that does not rely on any assumptions on the underlying network. There are already several works on the subject. 
Yao et al. [15] initiated the study of possible candidate topologies for a given trace set and suggested computing
the minimal topology, that is, the topology with the minimal number of anonymous nodes, a problem that turns out to be
NP-hard. Consequently, different heuristics have been proposed [?].

Our work is motivated by a series of papers by Acharya and Gouda. In [3], a network tracing theory model
is introduced where nodes are "irregular" in the sense that each node appears in at least one trace with its real
identifier. In [1], hardness results are derived for this model. However, as pointed out by the authors themselves,
the irregular node model (where nodes are anonymous due to high loads) is less relevant in practice, and hence
they consider strictly anonymous nodes in their follow-up studies [2]. As proved in [2], the problem is still hard
(in the sense that there are many minimal networks corresponding to a trace set), even with only two anonymous
nodes, symmetric routing and without aliasing.

In contrast to this line of research on cardinalities, we are interested in the network properties. If the inferred 
topologies share the most important characteristics, the negative results in [1, 2] may be of little concern. Moreover,
we believe that a study limited to minimal topologies only may miss important redundancy aspects of the Internet.
Unlike [1, 2], our work is constructive in the sense that algorithms can be derived to compute inferred topologies.

1.2 Our Contribution 

This paper initiates the study and characterization of topologies that can be inferred from a given trace set computed 
with the traceroute tool. While existing literature assuming a worst-case perspective has mainly focused on the 
cardinality of minimal topologies, we go one step further and examine specific topological graph properties. 

We introduce a formal theory of topology inference by proposing basic axioms (i.e., assumptions on the trace 
set) that are used to guide the inference process. We present a novel and, we believe, appealing definition for the
isomorphism of inferred topologies which is aware of traffic paths; it is motivated by the observation that although 
two topologies look equivalent up to a renaming of anonymous nodes, the same trace set may result in different 
paths. Moreover, we initiate the study of two extremes: in the first scenario, we only require that each link appears 
at least once in the trace set; interestingly, however, it turns out that this is often not sufficient, and we propose a 
"best case" scenario where the trace set is, in some sense, complete: it contains paths between all pairs of nodes. 

The main result of the paper is a negative one. It is shown that already a small number of anonymous nodes 
in the network renders topology inference difficult. In particular, we prove that in general, the possible inferrable 
topologies differ in many crucial aspects. 

We introduce the concept of the star graph of a trace set that is useful for the characterization of inferred
topologies. In particular, colorings of the star graphs allow us to constructively derive inferred topologies. (Although
the general problem of computing the set of inferrable topologies is related to NP-hard problems such as
minimal graph coloring and graph isomorphism, some important instances of inferrable topologies can be
computed efficiently.) The minimal coloring (i.e., the chromatic number) of the star graph defines a lower bound on the
number of anonymous nodes from which the stars in the traces could originate. Moreover, the number of possible
colorings of the star graph (a function of the chromatic polynomial of the star graph) gives an upper bound on the
number of inferrable topologies. We show that this bound is tight in the sense that there are situations where there
indeed exist so many inferrable topologies. In particular, there are problem instances where the cardinality of the set
of inferrable topologies equals the Bell number. This insight complements (and generalizes to arbitrary, not only
minimal, inferrable topologies) existing cardinality results.

Finally, we examine the scenario of fully explored networks for which "complete" trace sets are available. As
expected, inferrable topologies are more homogeneous and can be characterized well with respect to many properties
such as node distances. However, we also find that other properties are inherently difficult to estimate. Interestingly,
our results indicate that full exploration is often useful for global properties (such as connectivity) while it does not
help much for more local properties (such as node degree).

1.3 Organization 

The remainder of this paper is organized as follows. Our theory of topology inference is introduced in Section 2.
The main contribution is presented in Sections 3 and 4, where we derive bounds for general trace sets and fully
explored networks, respectively. In Section 5, the paper concludes with a discussion of our results and directions
for future research. Due to space constraints, some proofs are moved to the appendix.

2 Model 

Let $\mathcal{T}$ denote the set of traces obtained from probing (e.g., by traceroute) a (not necessarily connected and
undirected) network $G_0 = (V_0, E_0)$ with nodes or vertices $V_0$ (the set of routers) and links or edges $E_0$. We assume
that $G_0$ is static during the probing time (or that probing is instantaneous). Each trace $T(u, v) \in \mathcal{T}$ describes a
path connecting two nodes $u, v \in V_0$; when $u$ and $v$ do not matter or are clear from the context, we simply write
$T$. Moreover, let $d_T(u, v)$ denote the distance (number of hops) between two nodes $u$ and $v$ in trace $T$. We define
$d_{G_0}(u, v)$ to be the corresponding shortest path distance in $G_0$. Note that a trace between two nodes $u$ and $v$ may
not describe the shortest path between $u$ and $v$ in $G_0$.

The nodes in $V_0$ fall into two categories: anonymous nodes and non-anonymous (or, for short, named) nodes.
Therefore, each trace $T$ describes a sequence of symbols representing anonymous and non-anonymous nodes.
We make the natural assumption that the first and the last node in each trace $T$ is non-anonymous. Moreover, we
assume that traces are given in a form where non-anonymous nodes appear with a unique, anti-aliased identifier
(i.e., the multiple IP addresses corresponding to different interfaces of a node are resolved to one identifier); an
anonymous node is represented as $*$ ("star") in the traces. For our formal analysis, we assign to each star in a trace
set $\mathcal{T}$ a unique identifier $i$: $*_i$. (Note that except for the numbering of the stars, we allow identical copies of $T$ in
$\mathcal{T}$, and we do not make any assumptions on the implications of identical traces: they may or may not describe the
same paths.) Thus, a trace $T \in \mathcal{T}$ is a sequence of symbols taken from an alphabet $\Sigma = \mathcal{ID} \cup (\bigcup_i *_i)$, where $\mathcal{ID}$
is the set of non-anonymous node identifiers (IDs): $\Sigma$ is the union of the (anti-aliased) non-anonymous nodes and
the set of all stars (with their unique identifiers) appearing in a trace set. The main challenge in topology inference
is to determine which stars in the traces may originate from which anonymous nodes.

Henceforth, let $n = |\mathcal{ID}|$ denote the number of non-anonymous nodes and let $s = |\bigcup_i *_i|$ be the number of
stars in $\mathcal{T}$; similarly, let $a$ denote the number of anonymous nodes in a topology. Let $N = n + s = |\Sigma|$ be the total
number of symbols occurring in $\mathcal{T}$.

Clearly, the process of topology inference depends on the assumptions on the measurements. In the following, 
we postulate the fundamental axioms that guide the reconstruction. First, we make the assumption that each link of 
$G_0$ is visited by the measurement process, i.e., it appears as a transition in the trace set $\mathcal{T}$. In other words, we are
only interested in inferring the (sub-)graph for which measurement data is available. 

Axiom 0 (Complete Cover): Each edge of $G_0$ appears at least once in some trace in $\mathcal{T}$.

The next fundamental axiom assumes that traces always represent paths on $G_0$.

Axiom 1 (Reality Sampling): For every trace $T \in \mathcal{T}$, if the distance between two symbols $\sigma_1, \sigma_2 \in T$
is $d_T(\sigma_1, \sigma_2) = k$, then there exists a path (i.e., a walk without cycles) of length $k$ connecting the two (named or
anonymous) nodes corresponding to $\sigma_1$ and $\sigma_2$ in $G_0$.

The following axiom captures the consistency of the routing protocol on which the traceroute probing relies. In
the current Internet, policy routing is known to have an impact both on the route length [14] and on the convergence
time [?].

Axiom 2 ($\alpha$-(Routing) Consistency): There exists an $\alpha \in (0, 1]$ such that, for every trace $T \in \mathcal{T}$, if
$d_T(\sigma_1, \sigma_2) = k$ for two entries $\sigma_1, \sigma_2$ in trace $T$, then the shortest path connecting the two (named or
anonymous) nodes corresponding to $\sigma_1$ and $\sigma_2$ in $G_0$ has distance at least $\lceil \alpha k \rceil$.
Note that if $\alpha = 1$, the routing is a shortest path routing. Moreover, note that if $\alpha = 0$, there can be loops in
the paths, and there are hardly any topological constraints, rendering almost any topology inferrable. (For example,
the complete graph with one anonymous router is always a solution.)
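Axiom 2 can be checked mechanically for a candidate topology. The following is a minimal sketch, not taken from the paper: it assumes the topology is given as a dictionary of adjacency sets, that traces are lists of symbols, and that an optional `mapping` function plays the role of the symbol-to-node mapping (the identity if omitted). The names `bfs_distances` and `satisfies_axiom2` are hypothetical. $\alpha$ is converted to an exact fraction so that $\lceil \alpha k \rceil$ is not distorted by floating-point rounding.

```python
import math
from collections import deque
from fractions import Fraction


def bfs_distances(adj, src):
    """Hop distances from src in an undirected graph given as {node: set(neighbours)}."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def satisfies_axiom2(adj, traces, alpha, mapping=None):
    """Check ceil(alpha * d_T) <= d_G for every pair of entries of every trace.

    `mapping` (hypothetical) maps a trace symbol to a node of the topology;
    the identity is used if it is omitted.
    """
    alpha = Fraction(str(alpha))  # exact arithmetic avoids float rounding in ceil()
    mapping = mapping or (lambda sym: sym)
    for trace in traces:
        nodes = [mapping(sym) for sym in trace]
        for i, src in enumerate(nodes):
            dist = bfs_distances(adj, src)
            for j in range(i + 1, len(nodes)):
                # Unreachable nodes get distance infinity, which never violates the bound.
                if dist.get(nodes[j], math.inf) < math.ceil(alpha * (j - i)):
                    return False
    return True
```

For example, on a triangle `a, b, c` the trace `(a, b, c)` is a detour of length 2 between nodes at distance 1, so it is rejected for $\alpha = 1$ but accepted for $\alpha = 1/2$, where only $\lceil 2 \cdot 1/2 \rceil = 1$ hop is required.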

A natural axiom to merge traces is the following. 

Axiom 3 (Trace Merging): For two traces $T_1, T_2 \in \mathcal{T}$ for which $\exists \sigma_1, \sigma_2, \sigma_3$, where $\sigma_2$ refers to a named node,
such that $d_{T_1}(\sigma_1, \sigma_2) = i$ and $d_{T_2}(\sigma_2, \sigma_3) = j$, it holds that the distance between the two nodes $u$ and $v$ corresponding
to $\sigma_1$ and $\sigma_3$, respectively, in $G_0$ is at most $i + j$, i.e., $d_{G_0}(\sigma_1, \sigma_3) \le i + j$.

Any topology G which is consistent with these axioms (when applied to T) is called inferrable from T. 

Definition 2.1 (Inferrable Topologies). A topology $G$ is ($\alpha$-consistently) inferrable from a trace set $\mathcal{T}$ if
Axiom 0, Axiom 1, Axiom 2 (with parameter $\alpha$), and Axiom 3 are fulfilled.

We denote by $\mathcal{G}_\mathcal{T}$ the set of topologies inferrable from $\mathcal{T}$. Please note the following important observation.

Remark 2.2. While we generally have that $G_0 \in \mathcal{G}_\mathcal{T}$, since $\mathcal{T}$ was generated from $G_0$ and Axiom 0, Axiom 1,
Axiom 2 and Axiom 3 are fulfilled by definition, there can be situations where an $\alpha$-consistent trace set for
$G_0$ contradicts Axiom 0: some edges may not appear in $\mathcal{T}$. If this is the case, we will focus on the inferrable
topologies containing the links we know, even if $G_0$ may have additional, hidden links that cannot be explored due
to the high $\alpha$ value.

The main objective of a topology inference algorithm Alg is to compute topologies which are consistent with
these axioms. Concretely, Alg's input is the trace set $\mathcal{T}$ together with the parameter $\alpha$ specifying the assumed
routing consistency. Essentially, the goal of any topology inference algorithm Alg is to compute a mapping of
the symbols $\Sigma$ (appearing in $\mathcal{T}$) to nodes in an inferred topology $G$ or, in case the input parameters $\alpha$ and $\mathcal{T}$ are
contradictory, to reject the input. This mapping of symbols to nodes implicitly describes the edge set of $G$ as well:
the edge set is unique as all the transitions of the traces in $\mathcal{T}$ are now unambiguously tied to two nodes.

So far, we have ignored an important and non-trivial question: When are two topologies $G_1, G_2 \in \mathcal{G}_\mathcal{T}$ different (and
hence appear as two independent topologies in $\mathcal{G}_\mathcal{T}$)? In this paper, we pursue the following approach: We are not
interested in purely topological isomorphisms, but we care about the identifiers of the non-anonymous nodes, i.e., we are
interested in the locations of the non-anonymous nodes and their distance to other nodes. For anonymous nodes, the
situation is slightly more complicated: one might think that as the nodes are anonymous, their "names" do not matter.
Consider, however, the example in Figure 1: the two inferrable topologies have two anonymous nodes, once where
$\{*_1, *_2\}$ plus $\{*_3, *_4\}$ are merged into one node each in the inferrable topology and once where $\{*_1, *_4\}$ plus
$\{*_2, *_3\}$ are merged into one node each in the inferrable topology. In this paper, we regard the two topologies as different,
for the following reason: Assume that there are two paths in the network, one from $u$ via $*_2$ to $v$ (e.g., during day time)
and one from $u$ via $*_3$ to $v$ (e.g., at night); clearly, this traffic has different consequences and hence we want to be able
to distinguish between the two topologies described above. In other words, our notion of isomorphism of inferred
topologies is path-aware.

[Figure 1: Two non-isomorphic inferred topologies, i.e., topologies obtained from different mapping functions.]

It is convenient to introduce the following mapping function Map. Essentially, an inference algorithm computes such a
mapping.

Definition 2.3 (Mapping Function Map). Let $G = (V, E) \in \mathcal{G}_\mathcal{T}$ be a topology inferrable from $\mathcal{T}$. A topology
inference algorithm describes a surjective mapping function $\mathrm{Map}: \Sigma \to V$. For the set of non-anonymous nodes
in $\Sigma$, the mapping function is bijective; and each star is mapped to exactly one node in $V$, but multiple stars may be
assigned to the same node. Note that for any $\sigma \in \Sigma$, $\mathrm{Map}(\sigma)$ uniquely identifies a node $v \in V$. More specifically,
we assume that Map assigns labels to the nodes in $V$: in case of a named node, the label is simply the node's
identifier; in case of anonymous nodes, the label is $*_\beta$, where $\beta$ is the concatenation of the sorted indices of the
stars which are merged into node $*_\beta$.

With this definition, two topologies $G_1, G_2 \in \mathcal{G}_\mathcal{T}$ differ if and only if they do not describe the identical (Map-)
labeled topology. We will use this Map function also for $G_0$, i.e., we will write $\mathrm{Map}(\sigma)$ to refer to a symbol $\sigma$'s
corresponding node in $G_0$.

In the remainder of this paper, we will often assume that Axiom 0 is given. Moreover, note that Axiom 3
is redundant (Lemma 2.4). Therefore, in our proofs, we will not explicitly cover Axiom 0, and it is sufficient to show that
Axiom 1 holds to prove that Axiom 3 is satisfied.

Lemma 2.4. Axiom 1 implies Axiom 3. 

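Proof. Let $\mathcal{T}$ be a trace set, and $G \in \mathcal{G}_\mathcal{T}$. Let $\sigma_1, \sigma_2, \sigma_3$ be such that $\exists T_1, T_2 \in \mathcal{T}$ with $\sigma_1 \in T_1$, $\sigma_3 \in T_2$ and $\sigma_2 \in T_1 \cap T_2$.
Let $i = d_{T_1}(\sigma_1, \sigma_2)$ and $j = d_{T_2}(\sigma_2, \sigma_3)$. Since any inferrable topology $G$ fulfills Axiom 1, there is a path $\pi_1$ of
length at most $i$ between the nodes corresponding to $\sigma_1$ and $\sigma_2$ in $G$ and a path $\pi_2$ of length at most $j$ between the
nodes corresponding to $\sigma_2$ and $\sigma_3$ in $G$. Concatenating $\pi_1$ and $\pi_2$ yields a walk of length at most $i + j$ between the
nodes corresponding to $\sigma_1$ and $\sigma_3$; their shortest distance can only be shorter, and hence the claim follows. □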

3 Inferrable Topologies 

What insights can be obtained from topology inference with minimal assumptions, i.e., with our axioms? Or what 
is the structure of the inferrable topology set $\mathcal{G}_\mathcal{T}$? We first make some general observations and then examine
different graph metrics in more detail. 

3.1 Basic Observations 

Although the generation of the entire topology set $\mathcal{G}_\mathcal{T}$ may be computationally hard, some instances of $\mathcal{G}_\mathcal{T}$ can be
computed efficiently. The simplest possible inferrable topology is the so-called canonic graph $G_C$: the topology
which assumes that all stars in the traces refer to different anonymous nodes. In other words, if a trace set $\mathcal{T}$
contains $n = |\mathcal{ID}|$ named nodes and $s$ stars, $G_C$ will contain $|V(G_C)| = N = n + s$ nodes.

Definition 3.1 (Canonic Graph $G_C$). The canonic graph is defined by $G_C = (V_C, E_C)$, where $V_C = \Sigma$ is the set
of (anti-aliased) nodes appearing in $\mathcal{T}$ (where each star is considered a unique anonymous node) and where
$\{\sigma_1, \sigma_2\} \in E_C \Leftrightarrow \exists T \in \mathcal{T}, T = (\ldots, \sigma_1, \sigma_2, \ldots)$, i.e., $\sigma_1$ and $\sigma_2$ are consecutive entries of some trace $T$ ($\sigma_1, \sigma_2 \in T$ can be
either non-anonymous nodes or stars). Let $d_C(\sigma_1, \sigma_2)$ denote the canonic distance between two nodes, i.e., the
length of a shortest path in $G_C$ between the nodes $\sigma_1$ and $\sigma_2$.

Note that $G_C$ is indeed an inferrable topology. In this case, $\mathrm{Map}: \Sigma \to \Sigma$ is the identity function. The proof
appears in the appendix.

Theorem 3.2. $G_C$ is inferrable from $\mathcal{T}$.

$G_C$ can be computed efficiently from $\mathcal{T}$: represent each non-anonymous node and star as a separate node, and
for any pair of consecutive entries (i.e., nodes) in a trace, add the corresponding link. The time complexity of this
construction is linear in the size of $\mathcal{T}$.
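The construction of $G_C$ described above can be sketched in a few lines. This is an illustrative sketch only: it assumes traces are lists of strings in which named nodes carry their identifier and stars are already numbered (e.g., `"*1"`, `"*2"`), and the function name `canonic_graph` is hypothetical. Each trace is scanned once and every transition adds one undirected edge, so the running time is linear in the total length of the traces (with expected constant-time set operations).

```python
def canonic_graph(traces):
    """Build G_C: every symbol (named ID or numbered star) becomes its own node,
    and consecutive trace entries are joined by an undirected edge."""
    adj = {}
    for trace in traces:
        for sym in trace:
            adj.setdefault(sym, set())
        for a, b in zip(trace, trace[1:]):  # one edge per transition
            adj[a].add(b)
            adj[b].add(a)
    return adj


g = canonic_graph([["u", "*1", "v"], ["u", "*2", "w"]])
print(sorted(g["u"]))  # ['*1', '*2']
```

With $n = 3$ named nodes and $s = 2$ stars, the resulting graph has $N = 5$ nodes, as stated above.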

With the definition of the canonic graph, we can derive the following lemma, which establishes, from constraints on the
routing paths, conditions under which two stars cannot represent the same node in $G_0$. This is
useful for the characterization of inferred topologies.

Lemma 3.3. Let $*_1, *_2$ be two stars occurring in some traces in $\mathcal{T}$. $*_1, *_2$ cannot be mapped to the same node, i.e.,
$\mathrm{Map}(*_1) \neq \mathrm{Map}(*_2)$, without violating the axioms in the following conflict situations:

(i) if $*_1 \in T_1$ and $*_2 \in T_2$, and $T_1$ describes a too long path between anonymous node $\mathrm{Map}(*_1)$ and non-anonymous
node $u$, i.e., $\lceil \alpha \cdot d_{T_1}(*_1, u) \rceil > d_C(u, *_2)$.

(ii) if $*_1 \in T_1$ and $*_2 \in T_2$, and there exists a trace $T$ that contains a path between two non-anonymous nodes $u$
and $v$ and $\lceil \alpha \cdot d_T(u, v) \rceil > d_C(u, *_1) + d_C(v, *_2)$.

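Proof. The first proof is by contradiction. Assume $\mathrm{Map}(*_1) = \mathrm{Map}(*_2)$ represents the same node $v$ of $G_0$, and
that $\lceil \alpha \cdot d_{T_1}(*_1, u) \rceil > d_C(u, *_2)$. Since $G_0$ contains every transition of the traces and $*_2$ is mapped to $v$,
distances in $G_0$ cannot exceed canonic distances. Together with Axiom 2, this yields
$d_C(u, *_2) \ge d_{G_0}(u, v) \ge \lceil \alpha \cdot d_{T_1}(*_1, u) \rceil > d_C(u, *_2)$, the desired contradiction.

Similarly for the second proof. Assume for the sake of contradiction that $\mathrm{Map}(*_1) = \mathrm{Map}(*_2)$ represents
the same node $w$ of $G_0$, and that $\lceil \alpha \cdot d_T(u, v) \rceil > d_C(u, *_1) + d_C(v, *_2)$. Due to the triangle inequality (and, again,
since distances in $G_0$ cannot exceed canonic distances), we have that $d_{G_0}(u, v) \le d_{G_0}(u, w) + d_{G_0}(w, v) \le d_C(u, *_1) + d_C(v, *_2)$
and hence, $\lceil \alpha \cdot d_T(u, v) \rceil > d_{G_0}(u, v)$, which contradicts Axiom 2. □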



Lemma 3.3 can be applied to show that a topology is not inferrable from a given trace set because it merges 
(i.e., maps to the same node) two stars in a manner that violates the axioms. Let us introduce a useful concept for 
our analysis: the star graph that describes the conflicts between stars. 

Definition 3.4 (Star Graph $G_*$). The star graph $G_* = (V_*, E_*)$ consists of vertices representing stars in traces, i.e.,
$V_* = \bigcup_i *_i$. Two vertices are connected if and only if they must differ according to Lemma 3.3, i.e., $\{*_1, *_2\} \in E_*$
if and only if at least one of the conditions of Lemma 3.3 holds for $*_1, *_2$.

Note that the star graph $G_*$ is unique and can be computed efficiently for a given trace set $\mathcal{T}$: Conditions (i)
and (ii) can be checked by computing $G_C$. However, note that while $G_*$ specifies some stars which cannot be
merged, the construction is not sufficient: as Lemma 3.3 is based on $G_C$, additional links might be needed to
characterize the set of inferrable and $\alpha$-consistent topologies $\mathcal{G}_\mathcal{T}$ exactly. In other words, a topology $G$ obtained by
merging stars that are adjacent in $G_*$ is never inferrable ($G \notin \mathcal{G}_\mathcal{T}$); however, merging non-adjacent stars does not
guarantee that the resulting topology is inferrable.
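To make the two conflict conditions of Lemma 3.3 concrete, here is a minimal sketch that computes the edge set of $G_*$ from all-pairs canonic distances (one breadth-first search per node of $G_C$, hence polynomial time). It is an illustration under stated assumptions, not the authors' implementation: stars are strings starting with `"*"`, each trace contains no repeated symbol, $\alpha$ is passed as a number or fraction, and all function names are hypothetical.

```python
import math
from collections import deque
from fractions import Fraction
from itertools import combinations, permutations


def is_star(sym):
    return sym.startswith("*")  # assumed convention: stars are written "*1", "*2", ...


def canonic_graph(traces):
    adj = {}
    for trace in traces:
        for sym in trace:
            adj.setdefault(sym, set())
        for a, b in zip(trace, trace[1:]):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def bfs_distances(adj, src):
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def star_graph(traces, alpha):
    """Return (stars, conflict edges) according to conditions (i) and (ii) of Lemma 3.3."""
    alpha = Fraction(str(alpha))  # exact ceil(alpha * k)
    adj = canonic_graph(traces)
    canon = {x: bfs_distances(adj, x) for x in adj}  # all-pairs canonic distances d_C

    def d_c(x, y):
        return canon[x].get(y, math.inf)  # infinite if disconnected in G_C

    stars = sorted(x for x in adj if is_star(x))
    edges = set()
    for trace in traces:
        pos = {sym: i for i, sym in enumerate(trace)}  # assumes no repeated symbols
        named = [x for x in trace if not is_star(x)]
        # (i) star s1 of this trace lies too far (in the trace) from named node u,
        #     compared with u's canonic distance to another star s2.
        for s1 in (x for x in trace if is_star(x)):
            for u in named:
                need = math.ceil(alpha * abs(pos[s1] - pos[u]))
                for s2 in stars:
                    if s2 != s1 and need > d_c(u, s2):
                        edges.add(frozenset((s1, s2)))
        # (ii) named nodes u, v of this trace are too far apart (in the trace) to be
        #      joined through one node reached from u via s1 and from v via s2.
        for u, v in permutations(named, 2):
            need = math.ceil(alpha * abs(pos[u] - pos[v]))
            for s1, s2 in combinations(stars, 2):
                if need > d_c(u, s1) + d_c(v, s2):
                    edges.add(frozenset((s1, s2)))
    return stars, edges
```

For the traces `(a, *1, b, c)` and `(c, *2, d)` with $\alpha = 1$, merging the stars would make $c$ adjacent to $*_1$, whereas the first trace claims a shortest route of two hops; accordingly, `star_graph` reports the conflict edge $\{*_1, *_2\}$. With $\alpha = 1/2$, the detour is tolerated and $G_*$ has no edges.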

What do star graphs look like? The answer is: arbitrary. The following lemma states that the set of possible star
graphs is equivalent to the class of general graphs. This claim holds for any $\alpha$. The proof appears in the appendix.

Lemma 3.5. For any graph $G = (V, E)$, there exists a trace set $\mathcal{T}$ such that $G$ is the star graph for $\mathcal{T}$.

The problem of computing inferrable topologies is related to the vertex colorings of the star graphs. We will
use the following definition, which relates a vertex coloring of $G_*$ to an inferrable topology $G$ by contracting
independent stars in $G_*$ to become one anonymous node in $G$. For example, observe that a maximum coloring
treating every star in the trace as a separate anonymous node describes the inferrable topology $G_C$.

Definition 3.6 (Coloring-Induced Graph). Let $\gamma$ denote a coloring of $G_*$ which assigns colors $1, \ldots, k$ to the
vertices of $G_*$: $\gamma: V_* \to \{1, \ldots, k\}$. We require that $\gamma$ is a proper coloring of $G_*$, i.e., that stars adjacent in $G_*$
are assigned different colors: $\{u, v\} \in E_* \Rightarrow \gamma(u) \neq \gamma(v)$. $G_\gamma$ is defined as the topology induced by $\gamma$. $G_\gamma$
describes the graph $G_C$ where nodes of the same color are contracted: two vertices $*_i$ and $*_j$ represent the same
node in $G_\gamma$, i.e., $\mathrm{Map}(*_i) = \mathrm{Map}(*_j)$, if and only if $\gamma(*_i) = \gamma(*_j)$.
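The contraction in Definition 3.6, combined with the labeling convention of Definition 2.3, can be sketched as follows. This is a hypothetical helper, assuming stars are written `"*<index>"` and that the coloring is a dictionary from stars to arbitrary color values; it rejects improper colorings when conflict edges are supplied. Merged nodes are labeled with the plain concatenation of sorted star indices, as in Definition 2.3, which becomes ambiguous once there are ten or more stars. Whether the result is $\alpha$-consistent still has to be checked separately (Lemma 3.7 only guarantees some $\alpha' > 0$).

```python
def induced_topology(traces, coloring, star_edges=()):
    """Contract the canonic graph G_C according to a star coloring (Definition 3.6).

    `coloring` maps every star ("*1", "*2", ...) to a color; stars with equal color
    become one anonymous node labelled "*" + concatenated sorted star indices.
    """
    for edge in star_edges:
        a, b = tuple(edge)
        if coloring[a] == coloring[b]:
            raise ValueError(f"improper coloring: {a} and {b} conflict in G_*")
    groups = {}
    for star, color in coloring.items():
        groups.setdefault(color, []).append(star)
    label = {}
    for members in groups.values():
        indices = sorted(int(star[1:]) for star in members)
        merged = "*" + "".join(str(i) for i in indices)
        for star in members:
            label[star] = merged

    def map_symbol(sym):
        return label.get(sym, sym)  # named nodes keep their identifier

    adj = {}
    for trace in traces:
        nodes = [map_symbol(sym) for sym in trace]
        for node in nodes:
            adj.setdefault(node, set())
        for a, b in zip(nodes, nodes[1:]):
            if a != b:  # skip self-loops created by contracting consecutive stars
                adj[a].add(b)
                adj[b].add(a)
    return adj


topo = induced_topology([["v", "*1", "v1", "u"], ["w", "*2", "w1", "u"]], {"*1": 0, "*2": 0})
print(sorted(topo["*12"]))  # ['v', 'v1', 'w', 'w1']
```

Giving both stars the same color yields a single anonymous node `*12` adjacent to all four neighbors of the original stars, while distinct colors reproduce $G_C$.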

The following two lemmas establish an intriguing relationship between colorings of $G_*$ and inferrable
topologies. Also note that Definition 3.6 implies that two different colorings of $G_*$ define two non-isomorphic inferrable
topologies.

We first show that while a coloring-induced topology always fulfills Axiom 1, the routing consistency is
sacrificed. The proof appears in the appendix.

Lemma 3.7. Let $\gamma$ be a proper coloring of $G_*$. The coloring-induced topology $G_\gamma$ is a topology fulfilling Axiom 2
with a routing consistency of $\alpha'$, for some positive $\alpha'$.

An inferrable topology always defines a proper coloring on $G_*$.
Lemma 3.8. Let $\mathcal{T}$ be a trace set and $G_*$ its corresponding star graph. If a topology $G$ is inferrable from $\mathcal{T}$, then
$G$ induces a proper coloring on $G_*$.

Proof. For any $\alpha$-consistent inferrable topology $G$ there exists some mapping function Map that assigns each
symbol of $\mathcal{T}$ to a corresponding node in $G$ (cf. Definition 2.3), and this mapping function gives a coloring on $G_*$
(i.e., merged stars appear as nodes of the same color in $G_*$). The coloring must be proper: due to Lemma 3.3, an
inferrable topology can never merge adjacent nodes of $G_*$. □
The colorings of $G_*$ allow us to derive an upper bound on the cardinality of $\mathcal{G}_\mathcal{T}$.

Theorem 3.9. Given a trace set $\mathcal{T}$ sampled from a network $G_0$ and $\mathcal{G}_\mathcal{T}$, the set of topologies inferrable from $\mathcal{T}$, it
holds that:

$$\sum_{k=\gamma(G_*)}^{|V_*|} \frac{P(G_*, k)}{k!} \ge |\mathcal{G}_\mathcal{T}|,$$

where $\gamma(G_*)$ is the chromatic number of $G_*$ and $P(G_*, k)$ is the number of colorings of $G_*$ with $k$ colors (known
as the chromatic polynomial of $G_*$).



Proof. The proof follows directly from Lemma 3.8, which shows that each inferred topology induces a proper coloring,
and the fact that a coloring of $G_*$ cannot result in two different inferred topologies, as the coloring uniquely
describes which stars to merge (Lemma 3.7). In order to account for isomorphic colorings, we need to divide by
the number of color permutations. □
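For very small star graphs, the bound of Theorem 3.9 can be evaluated directly. The sketch below is a brute-force illustration (exponential in the number of stars, so only suitable for toy examples): it counts proper colorings to obtain $P(G_*, k)$, takes the chromatic number as the smallest $k$ with $P(G_*, k) > 0$, and sums $P(G_*, k)/k!$ exactly with fractions. The function names are hypothetical, and edges are assumed to be given as 2-element tuples or frozensets.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def chromatic_polynomial(vertices, edges, k):
    """P(G, k): number of proper colorings with colors {0, ..., k-1} (brute force)."""
    count = 0
    for colors in product(range(k), repeat=len(vertices)):
        c = dict(zip(vertices, colors))
        if all(c[a] != c[b] for a, b in edges):
            count += 1
    return count


def theorem_3_9_bound(vertices, edges):
    """Evaluate sum_{k=chi}^{|V_*|} P(G_*, k) / k! for a small star graph."""
    n = len(vertices)
    if n == 0:
        return Fraction(1)
    values = {k: chromatic_polynomial(vertices, edges, k) for k in range(1, n + 1)}
    chi = min(k for k, p in values.items() if p > 0)  # chromatic number gamma(G_*)
    return sum(Fraction(values[k], factorial(k)) for k in range(chi, n + 1))


stars = ["*1", "*2", "*3"]
print(theorem_3_9_bound(stars, []))  # 19/2
```

For three stars without conflicts the bound evaluates to $1 + 8/2 + 27/6 = 19/2$, whereas only five groupings of the stars actually exist, so the bound can be loose; for a triangle of conflicts it is exactly 1.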



Note that the fact that $G_*$ can be an arbitrary graph (Lemma 3.5) implies that we cannot exploit special
properties of $G_*$ to compute colorings of $G_*$ and $\gamma(G_*)$. Also note that the exact computation of the upper bound
is hard, since the minimal coloring as well as the chromatic polynomial of $G_*$ (which is #P-hard to compute in general) are needed. To complement
the upper bound, we note that star graphs with a small number of conflict edges can indeed result in a large number
of inferred topologies.

Theorem 3.10. For any $\alpha > 0$, there is a trace set for which the number of non-isomorphic colorings of $G_*$ equals
$|\mathcal{G}_\mathcal{T}|$. In particular, $|\mathcal{G}_\mathcal{T}| = B_s$, where $\mathcal{G}_\mathcal{T}$ is the set of inferrable and $\alpha$-consistent topologies, $s$ is the number of
stars in $\mathcal{T}$, and $B_s$ is the Bell number of $s$. Such a trace set can originate from a $G_0$ network with one anonymous
node only.

Proof. Consider a trace set $\mathcal{T} = \{(\sigma_i, *_i, \sigma_i')\}_{i=1,\ldots,s}$ (e.g., obtained from exploring a topology $G_0$ where one
anonymous center node is connected to $2s$ named nodes). The trace set does not impose any constraints on how the
stars relate to each other, and hence, $G_*$ does not contain any edges at all; even when stars are merged, there are no
constraints on how the stars relate to each other. Therefore, the star graph for $\mathcal{T}$ has $B_s = \sum_{j=0}^{s} S(s, j)$ colorings,
where

$$S(s, j) = \frac{1}{j!} \sum_{\ell=0}^{j} (-1)^{\ell} \binom{j}{\ell} (j - \ell)^{s}$$

is the number of ways to group $s$ nodes into $j$ different, disjoint non-empty subsets (known as the Stirling number
of the second kind). Each of these colorings also describes a distinct inferrable topology, as Map assigns unique
labels to anonymous nodes stemming from merging a group of stars (cf. Definition 2.3). □
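The counting argument can be checked numerically. The sketch below evaluates $S(s, j)$ with the explicit formula from the proof, sums it to $B_s$, and compares the result against a brute-force enumeration of all ways to group $s$ unconstrained stars (via restricted growth strings, where each star either joins an existing group or opens a new one). Function names are illustrative only.

```python
from math import comb, factorial


def stirling2(s, j):
    """S(s, j) via the explicit formula 1/j! * sum_t (-1)^t C(j, t) (j - t)^s."""
    total = sum((-1) ** t * comb(j, t) * (j - t) ** s for t in range(j + 1))
    return total // factorial(j)  # the sum is always divisible by j!


def bell(s):
    """B_s = sum_j S(s, j): number of ways to merge s unconstrained stars."""
    return sum(stirling2(s, j) for j in range(s + 1))


def count_merges(s):
    """Brute-force count of set partitions of s stars (restricted growth strings)."""
    def extend(i, largest):
        if i == s:
            return 1
        # Star i joins one of the existing groups 0..largest or opens group largest+1.
        return sum(extend(i + 1, max(largest, g)) for g in range(largest + 2))
    return extend(0, -1)


print([bell(s) for s in range(7)])  # [1, 1, 2, 5, 15, 52, 203]
```

Already for $s = 6$ stars emitted by a single anonymous router, 203 distinct topologies are inferrable, which illustrates how quickly $|\mathcal{G}_\mathcal{T}|$ grows.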



3.2 Properties 

Even if the number of inferrable topologies is large, topology inference can still be useful if one is mainly interested 
in the properties of Gq and if the ensemble Qq- is homogenous with respect to these properties; for example, if 
"most" of the instances in Qq- are close to Gq, there may be an option to conduct an efficient sampling analysis 
on random representatives. Therefore, in the following, we will take a closer look how much the members of Qq- 
differ. 

Important metrics to characterize inferrable topologies are, for instance, the graph size, the diameter $\mathrm{Diam}(\cdot)$,
the number of triangles $C_3(\cdot)$ of $G$, and so on. In the following, let $G_1 = (V_1, E_1), G_2 = (V_2, E_2) \in \mathcal{G}_\mathcal{T}$ be two
arbitrary representatives of $\mathcal{G}_\mathcal{T}$.

As one might expect, the graph size can be estimated quite well. 
Lemma 3.11. It holds that $|V_1| - |V_2| \le s - \gamma(G_*) \le s - 1$ and $|V_1|/|V_2| \le (n + s)/(n + \gamma(G_*)) \le (2 + s)/3$.
Moreover, $|E_1| - |E_2| \le 2(s - \gamma(G_*))$ and $|E_1|/|E_2| \le (\nu + 2s)/(\nu + 2) \le s$, where $\nu$ denotes the number of
edges between non-anonymous nodes. There are traces with inferrable topologies $G_1, G_2$ reaching these bounds.

Observe that inferrable topologies can also differ in the number of connected components. This implies that 
the shortest distance between two named nodes can differ arbitrarily between two representatives in $\mathcal{G}_\mathcal{T}$.

Lemma 3.12. Let $\mathrm{Comp}(G)$ denote the number of connected components of a topology $G$. Then, $|\mathrm{Comp}(G_1) -
\mathrm{Comp}(G_2)| \le \lfloor n/2 \rfloor$. There are instances $G_1, G_2$ that reach this bound.

Proof. Consider the trace set $\mathcal{T} = \{T_i,\ i = 1, \ldots, \lfloor n/2 \rfloor\}$ in which $T_i = (n_{2i}, *_i, n_{2i+1})$. Since $T_i \cap T_j = \emptyset$ for all $i \neq j$, we have $E_* = \emptyset$. Take $G_1$ as the 1-coloring of $G_*$: $G_1$ is a topology with one anonymous node connected to all named nodes. Take $G_2$ as the $\lfloor n/2 \rfloor$-coloring of the star graph: $G_2$ has $\lfloor n/2 \rfloor$ distinct connected components (each consisting of three nodes).

Upper bound: For the sake of contradiction, suppose there is a trace set $\mathcal{T}$ such that $|\mathrm{Comp}(G_1) - \mathrm{Comp}(G_2)| > \lfloor n/2 \rfloor$. Without loss of generality, assume that $G_1$ has more connected components: $G_1$ has at least $\lfloor n/2 \rfloor + 1$ more connected components than $G_2$. Let $C$ refer to a connected component of $G_2$ whose nodes are not connected in $G_1$. This means that $C$ contains at least one anonymous node. Thus, $C$ contains at least two named nodes (since a trace cannot start or end with a star). There must exist at least $\lfloor n/2 \rfloor + 1$ such connected components $C$. Thus $G_2$ has to contain at least $2(\lfloor n/2 \rfloor + 1) \ge n + 1$ named nodes. Contradiction. □
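The example in the proof of Lemma 3.12 can be reproduced with a few lines of Python. This is a sketch of our own; `count_components` and `lemma_3_12_example` are hypothetical helper names, and `a` is a hypothetical label for the merged anonymous node. The 1-coloring yields a single component, while the canonic graph, i.e., the $\lfloor n/2 \rfloor$-coloring, yields $\lfloor n/2 \rfloor$ components.

```python
def count_components(traces, mapping=None):
    """Connected components of the topology obtained from `traces` when stars
    are relabeled by `mapping` (no mapping = canonic graph G_c)."""
    mapping = mapping or {}
    adj = {}
    for trace in traces:
        labels = [mapping.get(x, x) for x in trace]
        for x in labels:
            adj.setdefault(x, set())
        for a, b in zip(labels, labels[1:]):
            adj[a].add(b)
            adj[b].add(a)
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:  # iterative depth-first search over one component
            u = stack.pop()
            for x in adj[u] - seen:
                seen.add(x)
                stack.append(x)
    return components


def lemma_3_12_example(n):
    """Trace set T_i = (n_{2i}, *_i, n_{2i+1}) for i = 1..floor(n/2)."""
    m = n // 2
    traces = [[f"n{2 * i}", f"*{i}", f"n{2 * i + 1}"] for i in range(1, m + 1)]
    one_coloring = {f"*{i}": "a" for i in range(1, m + 1)}  # all stars merged
    return count_components(traces, one_coloring), count_components(traces)


print(lemma_3_12_example(10))  # (1, 5)
```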

An important criterion for topology inference regards the distortion of shortest paths. 

Definition 3.13 (Stretch). The maximal ratio of the distance of two non-anonymous nodes in $G_0$ and a connected topology $G$ is called the stretch $\rho$: $\rho = \max_{u,v} \max\{d_{G_0}(u,v)/d_G(u,v),\ d_G(u,v)/d_{G_0}(u,v)\}$, where the maximum is taken over all pairs of non-anonymous nodes $u, v$ of $G_0$.



From Lemma 3.12 we already know that inferrable topologies can differ in the number of connected components, and hence, the distance and the stretch between nodes can be arbitrarily wrong. Therefore, in the following, we will focus on connected graphs only. However, even if two nodes are connected, their distance can be much longer or shorter than in $G_0$. Figure 2 gives an example. Both topologies are inferrable from the traces $T_1 = (v, *, v_1, \ldots, v_k, u)$ and $T_2 = (w, *, w_1, \ldots, w_k, u)$. One inferrable topology is the canonic graph $G_c$ (Figure 2, left), whereas the other topology merges the two anonymous nodes (Figure 2, right). The distances between $v$ and $w$ are $2(k + 2)$ and $2$, respectively, implying a stretch of $k + 2$.
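The distances in the Figure 2 example can be computed by breadth-first search. In the following Python sketch (our own illustration; the star labels `*1`, `*2`, the label `a` for the merged anonymous node and the helper names are hypothetical), taking either topology as $G_0$, the ratio of the two distances between $v$ and $w$ is the stretch of the other one.

```python
from collections import deque
from fractions import Fraction


def adjacency(traces, mapping=None):
    """Adjacency sets of the topology obtained by relabeling stars via `mapping`
    (star -> anonymous node); without a mapping this is the canonic graph G_c."""
    mapping = mapping or {}
    adj = {}
    for trace in traces:
        labels = [mapping.get(x, x) for x in trace]
        for x in labels:
            adj.setdefault(x, set())
        for a, b in zip(labels, labels[1:]):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def hop_distance(adj, src, dst):
    """Breadth-first search; returns None if dst is unreachable."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for x in adj[u]:
            if x not in dist:
                dist[x] = dist[u] + 1
                queue.append(x)
    return None


def figure_2(k):
    t1 = ["v", "*1"] + [f"v{i}" for i in range(1, k + 1)] + ["u"]
    t2 = ["w", "*2"] + [f"w{i}" for i in range(1, k + 1)] + ["u"]
    d_canonic = hop_distance(adjacency([t1, t2]), "v", "w")
    d_merged = hop_distance(adjacency([t1, t2], {"*1": "a", "*2": "a"}), "v", "w")
    return d_canonic, d_merged, Fraction(d_canonic, d_merged)


print(*figure_2(3))  # 10 2 5
```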

Lemma 3.14. Let $u$ and $v$ be two arbitrary named nodes in the connected topologies $G_1$ and $G_2$. Then, even for only two stars in the trace set, it holds for the stretch that $\rho \le (N - 1)/2$. There are instances $G_1, G_2$ that reach this bound.

Figure 2: Due to the lack of a trace between v and w, the stretch of an inferred topology can be large.

We now turn our attention to the diameter and the degree.

Lemma 3.15. For connected topologies $G_1, G_2$ it holds that $\mathrm{Diam}(G_1) - \mathrm{Diam}(G_2) \le (s - 1)/s \cdot \mathrm{Diam}(G_c) \le (s - 1)(N - 1)/s$ and $\mathrm{Diam}(G_1)/\mathrm{Diam}(G_2) \le s$, where $\mathrm{Diam}$ denotes the graph diameter and $\mathrm{Diam}(G_1) \ge \mathrm{Diam}(G_2)$. There are instances $G_1, G_2$ that reach these bounds.

Proof. Upper bound: As $G_c$ does not merge any stars, it describes the network with the largest diameter. Let $\pi$ be a shortest path of maximal length between two nodes $u$ and $v$ in $G_c$, i.e., a path realizing $\mathrm{Diam}(G_c)$. In the extreme case, $\pi$ is the only path determining the network diameter and $\pi$ contains all star nodes. Then, even the graph where all $s$ stars are merged into one anonymous node has a diameter of at least $\mathrm{Diam}(G_c)/s$.

Example meeting the bound: Consider the trace set $\mathcal{T} = \{(u_1, \ldots, *_1, \ldots, u_2), (u_2, \ldots, *_2, \ldots, u_3), \ldots, (u_s, \ldots, *_s, \ldots, u_{s+1})\}$ with $x$ named nodes per trace and the star in the middle between $u_i$ and $u_{i+1}$ (assume $x$ to be even; $x$ does not include $u_i$ and $u_{i+1}$). It holds that $\mathrm{Diam}(G_c) = s \cdot (x + 2)$, whereas in a graph $G'$ where all stars are merged, $\mathrm{Diam}(G') = x + 2$. There are $n = s(x + 1) + 1$ non-anonymous nodes, so $x = (n - s - 1)/s$. Figure 3 depicts an example. □
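The example meeting the bound of Lemma 3.15 can be checked for small parameters with the Python sketch below. It is our own illustration; the labels `p1_0`, `q1_0`, ... for the intermediate named nodes and the label `a` for the merged star are hypothetical. For $s = 4$ and $x = 6$ it reports $\mathrm{Diam}(G_c) = s(x + 2) = 32$, $\mathrm{Diam}(G') = x + 2 = 8$, and $n = s(x + 1) + 1 = 29$ named nodes, so that $N - 1 = 32$ and the difference $24$ equals $(s - 1)/s \cdot (N - 1)$.

```python
from collections import deque


def adjacency(traces, mapping=None):
    """Topology obtained by relabeling stars via `mapping` (default: canonic graph)."""
    mapping = mapping or {}
    adj = {}
    for trace in traces:
        labels = [mapping.get(x, x) for x in trace]
        for x in labels:
            adj.setdefault(x, set())
        for a, b in zip(labels, labels[1:]):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def eccentricity(adj, src):
    """Largest BFS distance from src (graph assumed connected)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for x in adj[u]:
            if x not in dist:
                dist[x] = dist[u] + 1
                queue.append(x)
    return max(dist.values())


def diameter(adj):
    """All-pairs BFS; fine for small examples."""
    return max(eccentricity(adj, v) for v in adj)


def lemma_3_15_example(s, x):
    """Traces (u_i, x/2 named nodes, *_i, x/2 named nodes, u_{i+1}), i = 1..s."""
    assert x % 2 == 0, "x is assumed to be even"
    traces = []
    for i in range(1, s + 1):
        left = [f"p{i}_{j}" for j in range(x // 2)]
        right = [f"q{i}_{j}" for j in range(x // 2)]
        traces.append([f"u{i}"] + left + [f"*{i}"] + right + [f"u{i + 1}"])
    g_c = adjacency(traces)
    g_merged = adjacency(traces, {f"*{i}": "a" for i in range(1, s + 1)})
    n = sum(1 for node in g_c if not node.startswith("*"))  # named nodes
    return diameter(g_c), diameter(g_merged), n


print(lemma_3_15_example(4, 6))  # (32, 8, 29)
```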

Lemma 3.16. For the maximal node degree $\mathrm{Deg}$, we have $\mathrm{Deg}(G_1) - \mathrm{Deg}(G_2) \le 2(s - \gamma(G_*))$ and $\mathrm{Deg}(G_1)/\mathrm{Deg}(G_2) \le s - \gamma(G_*) + 1$. There are instances $G_1, G_2$ that reach these bounds.



Figure 3: Estimation error for diameter.

Another important topology measure, indicating how well meshed the network is, is the number of triangles.

Lemma 3.17. Let $C_3(G)$ be the number of cycles of length 3 of the graph $G$. It holds that $C_3(G_1) - C_3(G_2) \le 2s(s - 1)$, which can be reached. The relative error $C_3(G_1)/C_3(G_2)$ can be arbitrarily large unless the number of links between non-anonymous nodes exceeds $\lfloor n^2/4 \rfloor$, in which case the ratio is upper bounded by $2s(s - 1) + 1$.

Proof. Upper bound: Each node which is part of a triangle has at least two incident edges. Thus, a node $v$ can be part of at most $\binom{\mathrm{Deg}(v)}{2}$ triangles, where $\mathrm{Deg}(v)$ denotes $v$'s degree. As a consequence, the number of triangles containing an anonymous node in an inferrable topology with $a$ anonymous nodes $u_1, \ldots, u_a$ is at most $\sum_{j=1}^{a} \binom{\mathrm{Deg}(u_j)}{2}$. Given $s$, this sum is maximized if $a = 1$ and $\mathrm{Deg}(u_1) = 2s$, as $2s$ is the maximum degree possible due to Lemma 3.16. Thus there can be at most $\binom{2s}{2} = s \cdot (2s - 1)$ triangles containing an anonymous node in $G_1$. The number of triangles with at least one anonymous node is minimized in $G_c$ because in the canonic graph the degrees of the anonymous nodes are minimized, i.e., they are always exactly two. As a consequence, there cannot be more than $s$ such triangles in $G_c$. If $G_c$ contains only $s - x$ such triangles, then the number of triangles with at least one anonymous node in the topology $G_1$ is upper bounded by $s \cdot (2s - 1) - x$. The difference between the triangles in $G_1$ and $G_2$ is thus at most $s(2s - 1) - x - (s - x) = 2s(s - 1)$.

Example meeting this bound: If the non-anonymous nodes form a complete graph, all star nodes can be merged into one node in $G_1$, and $G_2 = G_c$, then the difference in the number of triangles matches the upper bound. Consequently, the ratio of triangles with anonymous nodes does not exceed $(s(2s - 1) - x)/(s - x)$. Thus the ratio can be infinite, as $x$ can reach $s$. However, if the number of links between the $n$ non-anonymous nodes exceeds $\lfloor n^2/4 \rfloor$, then there is at least one triangle, as the densest triangle-free graph on $n$ nodes, a complete bipartite graph, contains at most $\lfloor n^2/4 \rfloor$ links. □
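The triangle counts in the example above can be computed directly. The Python sketch below is our own illustration: it uses $s$ traces $(v_i, *_i, w_i)$ plus one trace per pair of named nodes, so that the $2s$ named nodes form a complete graph, and compares $G_c$ with the topology in which all stars are merged into one anonymous node (labeled `*`). It only counts triangles with at least one anonymous node; it does not check whether the merged topology satisfies the axioms for a given $\alpha$. The trace layout and helper names are assumptions for illustration.

```python
from itertools import combinations


def adjacency(traces, mapping=None):
    """Topology obtained by relabeling stars via `mapping` (default: canonic graph)."""
    mapping = mapping or {}
    adj = {}
    for trace in traces:
        labels = [mapping.get(x, x) for x in trace]
        for x in labels:
            adj.setdefault(x, set())
        for a, b in zip(labels, labels[1:]):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def anonymous_triangles(adj):
    """Triangles with at least one anonymous node (labels starting with '*')."""
    return sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
        and any(x.startswith("*") for x in (a, b, c))
    )


def lemma_3_17_example(s):
    named = [f"v{i}" for i in range(1, s + 1)] + [f"w{i}" for i in range(1, s + 1)]
    traces = [[f"v{i}", f"*{i}", f"w{i}"] for i in range(1, s + 1)]
    traces += [[a, b] for a, b in combinations(named, 2)]  # complete graph on named nodes
    g_c = adjacency(traces)
    g_1 = adjacency(traces, {f"*{i}": "*" for i in range(1, s + 1)})  # all stars merged
    return anonymous_triangles(g_1), anonymous_triangles(g_c)


t1, t2 = lemma_3_17_example(3)
print(t1, t2, t1 - t2)  # 15 3 12
```

For $s = 3$ the merged graph has $s(2s - 1) = 15$ such triangles and $G_c$ has $s = 3$, a difference of $2s(s - 1) = 12$.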



4 Full Exploration 

So far, we assumed that the trace set $\mathcal{T}$ contains each node and link of $G_0$ at least once. At first sight, this seems to be the best we can hope for. However, traces that explore the vicinity of anonymous nodes in different ways sometimes yield additional information that helps to characterize $\mathcal{G}_\mathcal{T}$ better.

This section introduces the concept of fully explored networks: T contains sufficiently many traces such that 
the distances between non-anonymous nodes can be estimated accurately. 

Definition 4.1 (Fully Explored Topologies). A topology $G_0$ is fully explored by a trace set $\mathcal{T}$ if $\mathcal{T}$ contains all nodes and links of $G_0$ and, for each pair $\{u, v\}$ of non-anonymous nodes in the same component of $G_0$, there exists a trace $T \in \mathcal{T}$ containing both nodes, i.e., $u \in T$ and $v \in T$.
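The pair condition of Definition 4.1 is easy to test mechanically. The following Python sketch (our own illustration with hypothetical helper names) lists the pairs of named nodes that lie in the same component of $G_0$ but never appear together in a trace. It assumes that $G_0$ is given explicitly as adjacency sets and does not check the first condition, i.e., that every node and link of $G_0$ is covered, since stars cannot be matched to nodes of $G_0$ without ground truth. The small $G_0$ below is an assumed topology consistent with the three-trace example at the end of the proof of Lemma 4.6.

```python
from itertools import combinations


def component_ids(adj):
    """Map every node of G_0 to a representative of its connected component."""
    comp = {}
    for start in adj:
        if start in comp:
            continue
        comp[start] = start
        stack = [start]
        while stack:
            u = stack.pop()
            for x in adj[u]:
                if x not in comp:
                    comp[x] = start
                    stack.append(x)
    return comp


def uncovered_pairs(adj0, named, traces):
    """Pairs {u, v} of named nodes in the same component of G_0 that do not
    appear together in any trace; an empty list means the pair condition holds."""
    comp = component_ids(adj0)
    covered = set()
    for trace in traces:
        names = sorted({x for x in trace if x in named})
        covered.update(combinations(names, 2))
    return [
        (u, v)
        for u, v in combinations(sorted(named), 2)
        if comp[u] == comp[v] and (u, v) not in covered
    ]


# Assumed G_0: named nodes u, v, w and one anonymous node x.
adj0 = {"u": {"v", "x"}, "v": {"u", "x"}, "w": {"x"}, "x": {"u", "v", "w"}}
traces = [["v", "*1", "w"], ["u", "*2", "w"], ["u", "v"]]
print(uncovered_pairs(adj0, {"u", "v", "w"}, traces))      # []
print(uncovered_pairs(adj0, {"u", "v", "w"}, traces[:2]))  # [('u', 'v')]
```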

In some sense, a trace set for a fully explored network is the best we can hope for. Properties that cannot be inferred well under the fully explored topology model are infeasible to infer without additional assumptions on $G_0$. In this sense, this section provides upper bounds on what can be learned from topology inference. In the following, we restrict ourselves to routing along shortest paths only ($\alpha = 1$).
Let us again study the properties of the family of inferrable topologies, now for a trace set that fully explores $G_0$. Obviously, all the upper bounds from Section 3 are still valid for fully explored topologies. In the following, let $G_1, G_2 \in \mathcal{G}_\mathcal{T}$ be arbitrary representatives of $\mathcal{G}_\mathcal{T}$ for a fully exploring trace set $\mathcal{T}$. A direct consequence of Definition 4.1 concerns the number of connected components and the stretch. (Recall that the stretch is defined with respect to named nodes only, and since $\alpha = 1$, a 1-consistent inferrable topology cannot include a shorter path between $u$ and $v$ than the one that must appear in a trace of $\mathcal{T}$.)

Lemma 4.2. It holds that $\mathrm{Comp}(G_1) = \mathrm{Comp}(G_2)$ $(= \mathrm{Comp}(G_0))$ and the stretch is 1.

The proofs of the following lemmata are analogous to our former proofs, the main difference being that there might be more conflicts, i.e., edges in $G_*$.

Lemma 4.3. For fully explored networks it holds that $|V_1| - |V_2| \le s - \gamma(G_*) \le s - 1$ and $|V_1|/|V_2| \le (n + s)/(n + \gamma(G_*)) \le (2 + s)/3$. Moreover, $|E_1| - |E_2| \le 2(s - \gamma(G_*))$ and $|E_1|/|E_2| \le (\nu + 2s)/(\nu + 2) \le s$, where $\nu$ denotes the number of links between non-anonymous nodes. There are traces with inferrable topologies $G_1, G_2$ reaching these bounds.

Lemma 4.4. For the maximal node degree, we have $\mathrm{Deg}(G_1) - \mathrm{Deg}(G_2) \le 2(s - \gamma(G_*))$ and $\mathrm{Deg}(G_1)/\mathrm{Deg}(G_2) \le s - \gamma(G_*) + 1$. There are instances $G_1, G_2$ that reach these bounds.



From Lemma 4.2 we know that fully explored scenarios yield a perfect stretch of one. However, regarding the diameter, the situation is different in the sense that distances between anonymous nodes play a role.

Lemma 4.5. For connected topologies $G_1, G_2$ it holds that $\mathrm{Diam}(G_1)/\mathrm{Diam}(G_2) \le 2$, where $\mathrm{Diam}$ denotes the graph diameter and $\mathrm{Diam}(G_1) \ge \mathrm{Diam}(G_2)$. There are instances $G_1, G_2$ that reach this bound. Moreover, there are instances with $\mathrm{Diam}(G_1) - \mathrm{Diam}(G_2) = s/2$.

The number of triangles with anonymous nodes still cannot be estimated accurately in the fully explored scenario.

Lemma 4.6. There exist graphs where $C_3(G_1) - C_3(G_2) = s(s - 1)/2$, and the relative error $C_3(G_1)/C_3(G_2)$ can be arbitrarily large.



5 Conclusion 

We understand our work as a first step to shed light onto the similarity of inferrable topologies based on the most basic axioms and without any assumptions on power-law properties, i.e., in the worst case. Using our formal framework, we show that the topologies inferrable from a given trace set may differ significantly. Thus, in the worst case, it is impossible to accurately characterize many topological properties of complex networks. To complement the general analysis, we propose the notion of fully explored networks or trace sets as a "best possible scenario". As expected, we find that fully exploring traces allow us to determine several properties of the network more accurately; however, it also turns out that even in this scenario, other topological properties are inherently hard to compute. Our results are summarized in Figure 4.

Our work opens several directions for future research. On the theoretical side, one may study whether the minimal inferrable topologies considered in, e.g., [1, 2], are more similar in nature. More importantly, while this paper presented results for the general worst case, it would be interesting to devise algorithms that compute, for a given trace set, worst-case bounds for the properties under consideration. For example, such approximate bounds would be helpful to decide whether additional measurements are needed. Moreover, such algorithms might even indicate the locations at which such measurements would be most useful.
Property / Scenario | Arbitrary: $G_1 - G_2$ | Arbitrary: $G_1/G_2$ | Fully explored ($\alpha = 1$): $G_1 - G_2$ | Fully explored ($\alpha = 1$): $G_1/G_2$
# of nodes | $\le s - \gamma(G_*)$ | $\le (n + s)/(n + \gamma(G_*))$ | $\le s - \gamma(G_*)$ | $\le (n + s)/(n + \gamma(G_*))$
# of links | $\le 2(s - \gamma(G_*))$ | $\le (\nu + 2s)/(\nu + 2)$ | $\le 2(s - \gamma(G_*))$ | $\le (\nu + 2s)/(\nu + 2)$
# of connected components | $\le n/2$ | $\le n/2$ | $= 0$ | $= 1$
Stretch | – | $\le (N - 1)/2$ | – | $= 1$
Diameter | $\le (s - 1)/s \cdot (N - 1)$ | $\le s$ | $s/2$ (*) | $\le 2$
Max. degree | $\le 2(s - \gamma(G_*))$ | $\le s - \gamma(G_*) + 1$ | $\le 2(s - \gamma(G_*))$ | $\le s - \gamma(G_*) + 1$
Triangles | $\le 2s(s - 1)$ | $\infty$ | $s(s - 1)/2$ (*) | $\infty$

Figure 4: Summary of our bounds on the properties of inferrable topologies. $s$ denotes the number of stars in the traces, $n$ is the number of named nodes, $N = n + s$, and $\nu$ denotes the number of links between named nodes. Note that trace sets meeting these bounds exist for all properties for which we have tight or upper bounds. For the two entries marked with (*), only "lower bounds" are derived, i.e., examples that yield at least the corresponding accuracy; as the upper bounds from the arbitrary scenario do not match, how to close the gap remains an open question.
A Deferred Proofs

A.1 Proof of Theorem 3.2

Fix $\mathcal{T}$. We have to prove that $G_c$ fulfills Axiom 0, Axiom 1 (which implies Axiom 3) and Axiom 2.

Axiom 0: The axiom holds trivially: only edges from the traces are used in $G_c$.

Axiom 1: Let $T \in \mathcal{T}$ and $\sigma_1, \sigma_2 \in T$. Let $k = d_T(\sigma_1, \sigma_2)$. We show that $G_c$ fulfills Axiom 1, namely, that there exists a path of length $k$ in $G_c$. Induction on $k$: ($k = 1$:) By the definition of $G_c$, $\{\sigma_1, \sigma_2\} \in E_c$; thus there exists a path of length one between $\sigma_1$ and $\sigma_2$. ($k > 1$:) Suppose Axiom 1 holds up to $k - 1$. Let $\sigma'_1, \ldots, \sigma'_{k-1}$ be the intermediary nodes between $\sigma_1$ and $\sigma_2$ in $T$: $T = (\ldots, \sigma_1, \sigma'_1, \ldots, \sigma'_{k-1}, \sigma_2, \ldots)$. By the induction hypothesis, in $G_c$ there is a path of length $k - 1$ between $\sigma_1$ and $\sigma'_{k-1}$. Let $\pi$ be this path. By definition of $G_c$, $\{\sigma'_{k-1}, \sigma_2\} \in E_c$. Thus appending $(\sigma'_{k-1}, \sigma_2)$ to $\pi$ yields the desired path of length $k$ linking $\sigma_1$ and $\sigma_2$: Axiom 1 thus holds up to $k$.

Axiom 2: We have to show that $d_T(\sigma_1, \sigma_2) = k \Rightarrow d_c(\sigma_1, \sigma_2) \ge \lceil \alpha \cdot k \rceil$. By contradiction, suppose that $G_c$ does not fulfill Axiom 2 with respect to $\alpha$. So there exist $k' < \lceil \alpha \cdot k \rceil$ and $\sigma_1, \sigma_2 \in V_c$ such that $d_c(\sigma_1, \sigma_2) = k'$. Let $\pi$ be a shortest path between $\sigma_1$ and $\sigma_2$ in $G_c$. Let $(T_1, \ldots, T_\ell)$ be the corresponding (possibly repeating) traces covering this path $\pi$ in $G_c$. Let $T_i \in (T_1, \ldots, T_\ell)$, and let $s_i$ and $e_i$ be the corresponding start and end nodes of $\pi$ in $T_i$. We will show that this path $\pi$ implies the existence of a path in $G_0$ which violates $\alpha$-consistency. Since $G_0$ is inferrable, $G_0$ fulfills Axiom 2; thus we have $d_c(\sigma_1, \sigma_2) = \sum_{i=1}^{\ell} d_{T_i}(s_i, e_i) = k' < \lceil \alpha \cdot k \rceil \le d_{G_0}(\sigma_1, \sigma_2)$, since $G_0$ is $\alpha$-consistent. However, $G_0$ also fulfills Axiom 1, thus $d_{T_i}(s_i, e_i) \ge d_{G_0}(s_i, e_i)$. Thus $\sum_{i=1}^{\ell} d_{G_0}(s_i, e_i) \le \sum_{i=1}^{\ell} d_{T_i}(s_i, e_i) < d_{G_0}(\sigma_1, \sigma_2)$: we have constructed a path from $\sigma_1$ to $\sigma_2$ in $G_0$ whose length is shorter than the distance between $\sigma_1$ and $\sigma_2$ in $G_0$, leading to the desired contradiction.
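Axioms 1 and 2 can be checked by brute force on small trace sets, which gives a quick sanity check of Theorem 3.2 (a test on examples, not a proof). In the Python sketch below (hypothetical helper names; stars are labels starting with `*`, and a mapping relabels stars to anonymous nodes), an empty mapping yields $G_c$. Both topologies of Figure 2 pass the check for $\alpha = 1$, whereas a mapping that puts two stars of the same trace onto one node fails, in line with the requirement $\mathrm{Map}(\sigma_1) \ne \mathrm{Map}(\sigma_2)$.

```python
from collections import deque
from math import ceil


def adjacency(traces, mapping):
    """Topology obtained by relabeling stars via `mapping` ({} = canonic graph G_c)."""
    adj = {}
    for trace in traces:
        labels = [mapping.get(x, x) for x in trace]
        for x in labels:
            adj.setdefault(x, set())
        for a, b in zip(labels, labels[1:]):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def bfs(adj, src):
    """Hop distances from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for x in adj[u]:
            if x not in dist:
                dist[x] = dist[u] + 1
                queue.append(x)
    return dist


def satisfies_axioms_1_2(traces, mapping, alpha):
    """For every trace and every pair at trace distance k, require
    d_G <= k (Axiom 1, distance form) and d_G >= ceil(alpha * k) (Axiom 2)."""
    adj = adjacency(traces, mapping)
    for trace in traces:
        labels = [mapping.get(x, x) for x in trace]
        for i, a in enumerate(labels):
            dist = bfs(adj, a)
            for j in range(i + 1, len(labels)):
                k = j - i
                d = dist.get(labels[j])
                if d is None or d > k or d < ceil(alpha * k):
                    return False
    return True


k = 3
t1 = ["v", "*1"] + [f"v{i}" for i in range(1, k + 1)] + ["u"]
t2 = ["w", "*2"] + [f"w{i}" for i in range(1, k + 1)] + ["u"]
print(satisfies_axioms_1_2([t1, t2], {}, 1))                        # True (G_c)
print(satisfies_axioms_1_2([t1, t2], {"*1": "a", "*2": "a"}, 1))    # True
print(satisfies_axioms_1_2([["v", "*1", "x", "*2", "w"]],
                           {"*1": "a", "*2": "a"}, 1))              # False
```

Because consecutive trace entries are always linked, the Axiom 1 test never fails for these constructions; it is kept only to mirror the axiom.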



A.2 Proof of Lemma 3.5

First we construct a topology $G_0 = (V_0, E_0)$ and then describe a trace set on this graph that generates the star graph $G = (V, E)$. The node set $V_0$ consists of $|V|$ anonymous nodes and $|V| \cdot (1 + r)$ named nodes, where $r = \lceil 3/(2\alpha) - 1/2 \rceil$. The first building block of $G_0$ is a copy of $G$. To each node $v_i$ in the copy of $G$ we add a chain consisting of $2 + r$ nodes, first appending $r$ non-anonymous nodes $w_{(i,k)}$ where $1 \le k \le r$, followed by an anonymous node $u_i$ and finally a named node $w_{(i,r+1)}$. More formally, we can describe the link set as $E_0 = E \cup \bigcup_{i=1}^{|V|} \{\{v_i, w_{(i,1)}\}, \{w_{(i,1)}, w_{(i,2)}\}, \ldots, \{w_{(i,r)}, u_i\}, \{u_i, w_{(i,r+1)}\}\}$. The trace set $\mathcal{T}$ consists of the following $|V| + |E|$ shortest-path traces: the traces $T_\ell$ for $\ell \in \{1, \ldots, |V|\}$ are given by $T_\ell(w_{(i,r)}, w_{(i,r+1)})$ (one for each node in $V$), and the traces $T_\ell$ for $\ell \in \{|V| + 1, \ldots, |V| + |E|\}$ are given by $T_\ell(w_{(i,r)}, w_{(j,r)})$ for each link $\{v_i, v_j\}$ in $E$. Note that $G_0 = G_c$, as each star appears as a separate anonymous node. The star graph $G_*$ corresponding to this trace set contains the $|V|$ nodes $*_i$ (corresponding to $u_i$). In order to prove the claim of the lemma, we have to show that two nodes $*_i, *_j$ are conflicting according to Lemma 3.3 if and only if there is a link $\{v_i, v_j\}$ in $E$. Case (i) does not apply because the minimum distance between any two nodes in the canonic graph is at least one, and $\lceil \alpha \cdot d_{T_\ell}(w_{(i,r)}, *_i) \rceil = \lceil \alpha \cdot d_{T_\ell}(*_i, w_{(i,r+1)}) \rceil = 1$. It remains to examine Case (ii). "⇒": If $\{v_i, v_j\} \in E$, the trace set contains a trace $T_\ell(w_{(i,r)}, w_{(j,r)})$ of length $2r + 1$. If $\mathrm{Map}(*_i) = \mathrm{Map}(*_j)$, there would be a path of length two between $w_{(i,r)}$ and $w_{(j,r)}$ in the topology generated by Map. However, $\lceil \alpha \cdot d_{T_\ell}(w_{(i,r)}, w_{(j,r)}) \rceil = \lceil \alpha \cdot (2r + 1) \rceil = \lceil \alpha \cdot (2\lceil 3/(2\alpha) - 1/2 \rceil + 1) \rceil \ge 3$, which violates $\alpha$-consistency (Lemma 3.3 (ii)); hence $\{*_i, *_j\} \in E_*$. "⇐": If $\{v_i, v_j\} \notin E$, there is no trace $T(w_{(i,r)}, w_{(j,r)})$; thus we have to prove that no trace $T_\ell(w_{(i',r)}, w_{(j',r)})$ with $i' \ne i$, $j' \ne j$ and $j' \ne i$ leads to a conflict between $*_i$ and $*_j$. We show that an even more general statement is true, namely that for any pair of distinct non-anonymous nodes $x_1, x_2$, where $x_1, x_2 \in \{v_{i'}, v_{j'}, w_{(i',k)}, w_{(j',k)} \mid 1 \le k \le r + 1\}$, it holds that $\lceil \alpha \cdot d_c(x_1, x_2) \rceil \le d_c(x_1, *_i) + d_c(x_2, *_j)$. Since $G_c = G_0$ and the traces contain shortest paths only, the trace distance between two nodes in the same trace is the same as the distance in $G_c$. The following tables contain the relevant lower bounds on distances in $G_c$ and on $\mu(x_1, x_2) = d_c(x_1, *_i) + d_c(x_2, *_j)$.





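Lower bounds on the distances $d_c(\cdot, \cdot)$ in $G_c$:

node | $v_{i'}$ | $v_{j'}$ | $w_{(i',k_1)}$ | $w_{(j',k_1)}$
$v_{i'}$ | – | $1$ | $k_1$ | $k_1 + 1$
$v_{j'}$ | $1$ | – | $k_1 + 1$ | $k_1$
$w_{(i',k_2)}$ | $k_2$ | $k_2 + 1$ | $\lvert k_2 - k_1 \rvert$ | $k_1 + 1 + k_2$
$w_{(j',k_2)}$ | $k_2 + 1$ | $k_2$ | $k_1 + 1 + k_2$ | $\lvert k_2 - k_1 \rvert$
$*_i$ | $r + 2$ | $r + 1$ | $2 + r + k_1$ | $r - k_1 + 1$
$*_j$ | $r + 2$ | $r + 2$ | $2 + r + k_1$ | $2 + r + k_1$

Lower bounds $\mu(\cdot, \cdot) \ge$:

node | $v_{i'}$ | $v_{j'}$ | $w_{(i',k_1)}$ | $w_{(j',k_1)}$
$v_{i'}$ | $2r + 4$ | $2r + 3$ | $4 + 2r + k_1$ | $4 + 2r + k_1$
$v_{j'}$ | $2r + 3$ | $2r + 4$ | $2r + 3 + k_1$ | $3 + 2r + k_1$
$w_{(i',k_2)}$ | $4 + 2r + k_2$ | $4 + 2r + k_2$ | $4 + 2r + k_1 + k_2$ | $4 + 2r + k_1 + k_2$
$w_{(j',k_2)}$ | $2r - k_2 + 3$ | $2r - k_2 + 3$ | $2r + 3 + k_1 - k_2$ | $2r + k_1 - k_2 + 3$

Table 1: Proof of Lemma 3.5: lower bounds for the distances in $G_c$, and lower bounds for $\mu(x_1, x_2) = d_c(x_1, *_i) + d_c(x_2, *_j)$.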

If $x_1 \ne w_{(j',k_2)}$, then it holds for all $x_1, x_2$ that $d_{T_\ell}(x_1, x_2) \le 2r + 1$, whereas $\mu(x_1, x_2) = d_c(x_1, *_i) + d_c(x_2, *_j) \ge 2r + 2$. In all other cases it holds at least that $d_c(x_1, x_2) \le \mu(x_1, x_2)$. Thus $\lceil \alpha \cdot d_c(x_1, x_2) \rceil \le d_c(x_1, *_i) + d_c(x_2, *_j)$. Consequently, we have conflicts if and only if $\{v_i, v_j\} \in E$, which concludes the proof.

A.3 Proof of Lemma 3.7

Figure 5: Visualization for proof of Lemma 3.7. Solid lines denote links, dashed lines denote paths (of annotated length).

We have to show that the paths in the traces correspond to paths in $G'$. Let $T \in \mathcal{T}$ and $\sigma_1, \sigma_2 \in T$. Let $\pi$ be the sequence of nodes in $T$ connecting $\sigma_1$ and $\sigma_2$. This is also a path in $G'$: since $\alpha > 0$, for any two symbols $\sigma_1, \sigma_2 \in T$, it holds that $\mathrm{Map}(\sigma_1) \ne \mathrm{Map}(\sigma_2)$.

We now construct an example showing that the $\alpha'$ for which $G'$ fulfills Axiom 2 can be arbitrarily small. Consider the graph represented in Figure 5. Let $T_1 = (s, \ldots, t)$, $T_2 = (s, *_1, \ldots, m_1)$, $T_3 = (m_1, \ldots, *_2, m_2)$, $T_4 = (m_2, *_3, \ldots, m_3)$, $T_5 = (m_3, \ldots, *_4, t)$. We assume $\alpha = 1$. By changing the parameters $k = d_c(s, t)$ and $k' = d_c(m_1, *_1) = d_c(m_1, *_2) = d_c(m_3, *_3) = d_c(m_3, *_4)$, we can modulate the links of the corresponding star graph $G_*$. Using $d_{T_1}(s, t) = k$, observe that $k > 2 \Leftrightarrow \{*_1, *_4\} \in E_*$. Similarly, $k > 2(k' + 1) \Leftrightarrow \{*_1, *_3\} \in E_* \wedge \{*_2, *_4\} \in E_*$, and $k > 2(k' + 2) \Leftrightarrow \{*_1, *_2\} \in E_* \wedge \{*_3, *_4\} \in E_*$. Taking $k = 2k' + 4$, we thus have $E_* = \{\{*_1, *_3\}, \{*_2, *_4\}, \{*_1, *_4\}\}$.

Thus, we construct a situation where $*_1$ and $*_2$ as well as $*_3$ and $*_4$ can be merged without breaking the consistency requirement, but where merging both pairs simultaneously leads to a topology $G'$ that is only $4/k$-consistent, since $d_{G'}(s, t) = 4$. This ratio can be made arbitrarily small by choosing $k' = (k - 4)/2$ and letting $k$ grow.

A.4 Proof of Lemma 3.11

In the worst case, each star in the traces represents a different node in $G_1$, so the maximal number of nodes in any topology in $\mathcal{G}_\mathcal{T}$ is the total number of non-anonymous nodes plus the total number of stars in $\mathcal{T}$. This number of nodes is reached in the topology $G_c$. According to Definition 3.4, only non-adjacent stars in $G_*$ can represent the same node in an inferrable topology. Thus, the stars in $\mathcal{T}$ must originate from at least $\gamma(G_*)$ different nodes. As a consequence, $|V_1| - |V_2| \le s - \gamma(G_*)$, which can reach $s - 1$ for the trace set $\mathcal{T} = \{T_i = (v, *_i, w) \mid 1 \le i \le s\}$. Analogously, $|V_1|/|V_2| \le (n + s)/(n + \gamma(G_*)) \le (2 + s)/3$.

Observe that each occurrence of a node in a trace describes at most two edges. If all anonymous nodes are separate nodes in $G_1$ and are merged into $\gamma(G_*)$ nodes in $G_2$, the difference in the number of edges is at most $2(s - \gamma(G_*))$. Analogously, $|E_1|/|E_2| \le (\nu + 2s)/(\nu + 2) \le s$. The trace set $\mathcal{T} = \{T_i = (v, *_i, w) \mid 1 \le i \le s\}$ reaches this bound.
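The construction in the proof of Lemma 3.7 (Appendix A.3) can be checked by brute force. The Python sketch below is our own illustration: it treats two stars as conflicting if merging just this pair violates Axiom 2 with $\alpha = 1$, a simplification of the conflict notion of Lemma 3.3 that we assume suffices here, and it uses hypothetical labels such as `t1` and `a1` for the unnamed intermediate nodes of Figure 5. For $k' = 2$ (so $k = 8$) it recovers $E_* = \{\{*_1, *_3\}, \{*_2, *_4\}, \{*_1, *_4\}\}$ and shows that merging both non-conflicting pairs yields $d_{G'}(s, t) = 4$.

```python
from collections import deque
from itertools import combinations
from math import ceil


def adjacency(traces, mapping):
    """Topology obtained by relabeling stars via `mapping` ({} = canonic graph)."""
    adj = {}
    for trace in traces:
        labels = [mapping.get(x, x) for x in trace]
        for x in labels:
            adj.setdefault(x, set())
        for a, b in zip(labels, labels[1:]):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def bfs(adj, src):
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for x in adj[u]:
            if x not in dist:
                dist[x] = dist[u] + 1
                queue.append(x)
    return dist


def consistent(traces, mapping, alpha=1):
    """Axiom 2: d_G(a, b) >= ceil(alpha * d_T(a, b)) for all pairs in a common trace."""
    adj = adjacency(traces, mapping)
    for trace in traces:
        labels = [mapping.get(x, x) for x in trace]
        for i, a in enumerate(labels):
            dist = bfs(adj, a)
            for j in range(i + 1, len(labels)):
                if dist[labels[j]] < ceil(alpha * (j - i)):
                    return False
    return True


def figure_5_traces(k_prime):
    """Traces T_1..T_5 of the Lemma 3.7 construction with k = 2k' + 4."""
    k = 2 * k_prime + 4

    def hops(prefix):  # k' - 1 intermediate named nodes
        return [f"{prefix}{i}" for i in range(1, k_prime)]

    return [
        ["s"] + [f"t{i}" for i in range(1, k)] + ["t"],
        ["s", "*1"] + hops("a") + ["m1"],
        ["m1"] + hops("b") + ["*2", "m2"],
        ["m2", "*3"] + hops("c") + ["m3"],
        ["m3"] + hops("d") + ["*4", "t"],
    ]


def star_conflicts(traces, stars, alpha=1):
    """Star pairs whose merging alone already violates alpha-consistency."""
    return {
        (a, b)
        for a, b in combinations(stars, 2)
        if not consistent(traces, {a: "M", b: "M"}, alpha)
    }


traces = figure_5_traces(2)  # k' = 2, k = 8
stars = ["*1", "*2", "*3", "*4"]
print(sorted(star_conflicts(traces, stars)))  # [('*1', '*3'), ('*1', '*4'), ('*2', '*4')]
both = adjacency(traces, {"*1": "A", "*2": "A", "*3": "B", "*4": "B"})
print(bfs(both, "s")["t"])  # 4, whereas d_{T_1}(s, t) = k = 8
```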

A.5 Proof of Lemma 3.14

A "lower bound" example follows from Figure 2. Essentially, this is also the worst case: note that the shortest distance between a pair of named nodes $u$ and $v$ can only differ between $G_1$ and $G_2$ if the shortest path between them involves at least one anonymous node. Hence the shortest distance between such a pair is at least two. The longest shortest distance between the same pair of nodes in another inferred topology visits all nodes in the network, i.e., its length is bounded by $N - 1$.

A.6 Proof of Lemma 3.16

Each occurrence of a node in a trace describes at most two links incident to this node. For the degree difference, we only have to consider the links incident to at least one anonymous node, as the number of links between non-anonymous nodes is the same in $G_1$ and $G_2$. If all anonymous nodes can be merged into $\gamma(G_*)$ nodes in $G_1$ and all anonymous nodes are separate in $G_2$, the difference in the maximum degree is thus at most $2(s - \gamma(G_*))$, as at most $s - \gamma(G_*) + 1$ stars can be merged into one node and the minimal maximum degree of a node in $G_2$ is two. This bound is tight, as the trace set $T_i = (v_i, *_i, w_i)$ for $1 \le i \le s$, containing $s$ stars, can be represented by a graph with one anonymous node of degree $2s$ or by a graph with $s$ anonymous nodes of degree two each. For the ratio of the maximal degrees we can ignore links between non-anonymous nodes as well, as these only decrease the ratio. The highest number of links incident to a node $v$ with one endpoint in the set of anonymous nodes is $s - \gamma(G_*) + 1$ for non-anonymous nodes and $2(s - \gamma(G_*) + 1)$ for anonymous nodes, whereas the lowest number is two.
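The tightness example for the maximal degree is easy to reproduce. In the Python sketch below (our own illustration; each trace carries its own star $*_i$, and `a` is a hypothetical label for the merged anonymous node), merging all $s$ stars yields a node of degree $2s$, while in the canonic graph every node has degree at most two.

```python
def max_degree(traces, mapping=None):
    """Maximal node degree of the topology obtained by relabeling stars via
    `mapping` (no mapping = canonic graph G_c)."""
    mapping = mapping or {}
    adj = {}
    for trace in traces:
        labels = [mapping.get(x, x) for x in trace]
        for x in labels:
            adj.setdefault(x, set())
        for a, b in zip(labels, labels[1:]):
            adj[a].add(b)
            adj[b].add(a)
    return max(len(neighbors) for neighbors in adj.values())


def degree_example(s):
    """Trace set T_i = (v_i, *_i, w_i), 1 <= i <= s."""
    traces = [[f"v{i}", f"*{i}", f"w{i}"] for i in range(1, s + 1)]
    merged = {f"*{i}": "a" for i in range(1, s + 1)}  # one anonymous node
    return max_degree(traces, merged), max_degree(traces)


print(degree_example(4))  # (8, 2)
```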

A.7 Proof of Lemma 4.4

The proof of the upper bound is analogous to the case without full exploration. To prove that this bound can be reached, we need to add traces to the trace set that ensure that all pairs of named nodes appear in a common trace without changing the degrees of the anonymous nodes. To this end, for each pair $\{v, w\}$ of named nodes that does not yet appear in a common trace, we add a named node $u$ to $G_0$ and a trace $T = (v, u, w)$. This does not increase the maximum degree and guarantees full exploration.

A.8 Proof of Lemma 4.5

We first prove the upper bound for the relative case. Let $\delta$ denote the maximal distance between a pair of named nodes $u$ and $v$ in the same component. From Definition 4.1 we know that there must be a trace in $\mathcal{T}$ containing $u$ and $v$, and $\delta$ is given by the path of this trace between $u$ and $v$. Since any trace starts and ends with a named node, any star is at distance at most $\delta/2$ from a named node. Therefore, the maximal distance between two anonymous nodes $\mathrm{Map}(*_1)$ and $\mathrm{Map}(*_2)$ in a component of an inferred topology is $\delta/2 + \delta/2$ to get to the corresponding closest named nodes, plus $\delta$ for the connection between these named nodes, i.e., at most $2\delta$. As, according to Lemma 4.2, the distance between named nodes is the same in all inferred topologies, the diameter of inferred topologies can vary at most by a factor of two.

We now construct an example that reaches this bound. Consider a topology consisting of a center node $c$ and four rays of length $k$. Let $u_1, u_2, u_3, u_4$ be the "end nodes" of the rays. We assume that all these nodes are named. Now add two chains of anonymous nodes of length $2k + 1$ to the topology, one between nodes $u_1$ and $u_2$ and one between nodes $u_3$ and $u_4$. The trace set consists of the minimal trace set to obtain a fully explored topology: six traces of length $2k + 1$, one between each pair of end nodes $u_1, u_2, u_3, u_4$. Now we add two traces of length $2k + 1$, one between nodes $u_1$ and $u_2$ and one between nodes $u_3$ and $u_4$. These traces explore the anonymous chains and have the following shape: $T_1 = (u_1, *_1, \ldots, *_k, \sigma, *_{k+1}, \ldots, *_{2k}, u_2)$ and $T_2 = (u_3, *_{2k+1}, \ldots, *_{3k}, \sigma', *_{3k+1}, \ldots, *_{4k}, u_4)$, where $\sigma$ and $\sigma'$ are stars. Let $G_1 = G_c$, and let $G_2$ be the inferrable graph where $\sigma$ and $\sigma'$ are merged. The resulting diameters are $\mathrm{Diam}(G_1) = 4k + 2$ and $\mathrm{Diam}(G_2) = 2k + 1$. Since $s = 4k + 2$, the difference can thus be as large as $s/2$. Note that this construction also yields the bound on the relative difference: $\mathrm{Diam}(G_1)/\mathrm{Diam}(G_2) = (4k + 2)/(2k + 1) = 2$.

A.9 Proof of Lemma 4.6

Given the number of stars $s$, we construct a trace set $\mathcal{T}$ with two inferrable graphs such that in one graph the number of triangles with anonymous nodes is $s(s - 1)/2$ and in the other graph there are no such triangles. As a first step, we add $s$ traces $T_i = (v_i, *_i, w)$ to the trace set $\mathcal{T}$, where $1 \le i \le s$. As a second step, to make this trace set fully explored, we add a trace for each pair $v_i, v_j$ to $\mathcal{T}$, i.e., traces $T_{i,j} = (v_i, v_j)$ for $1 \le i < j \le s$. The resulting trace set contains $s$ stars, and none of the stars are in conflict with each other. Thus the graph $G_1$ merging all stars into one anonymous node is inferrable from this trace set, and the number of triangles the anonymous node is part of is $s(s - 1)/2$. Let $G_2$ be the canonic graph of this trace set. This graph does not contain any triangles with anonymous nodes, and hence the difference $C_3(G_1) - C_3(G_2)$ is $s(s - 1)/2$.

To see that the ratio can be unbounded, consider the trace set $\{(v, *_1, w), (u, *_2, w), (u, v)\}$. This set is fully explored since all pairs of named nodes appear in a trace. The graph where the two stars are merged has one triangle, and the canonic graph has no triangle.
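Both constructions in this proof can be verified on small instances. The Python sketch below (hypothetical helper names; our own illustration, with `*` as the label of the merged anonymous node) checks that merging all stars keeps every trace a shortest path, i.e., satisfies Axiom 2 with $\alpha = 1$, and counts triangles with at least one anonymous node in the merged graph and in $G_c$; it then repeats the count for the three-trace example.

```python
from collections import deque
from itertools import combinations
from math import ceil


def adjacency(traces, mapping=None):
    """Topology obtained by relabeling stars via `mapping` (default: canonic graph)."""
    mapping = mapping or {}
    adj = {}
    for trace in traces:
        labels = [mapping.get(x, x) for x in trace]
        for x in labels:
            adj.setdefault(x, set())
        for a, b in zip(labels, labels[1:]):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def bfs(adj, src):
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for x in adj[u]:
            if x not in dist:
                dist[x] = dist[u] + 1
                queue.append(x)
    return dist


def consistent(traces, mapping, alpha=1):
    """Axiom 2: d_G(a, b) >= ceil(alpha * d_T(a, b)) for all pairs in a common trace."""
    adj = adjacency(traces, mapping)
    for trace in traces:
        labels = [mapping.get(x, x) for x in trace]
        for i, a in enumerate(labels):
            dist = bfs(adj, a)
            for j in range(i + 1, len(labels)):
                if dist[labels[j]] < ceil(alpha * (j - i)):
                    return False
    return True


def anonymous_triangles(adj):
    """Triangles with at least one anonymous node (labels starting with '*')."""
    return sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
        and any(x.startswith("*") for x in (a, b, c))
    )


def lemma_4_6_example(s):
    named_v = [f"v{i}" for i in range(1, s + 1)]
    traces = [[v, f"*{i}", "w"] for i, v in enumerate(named_v, start=1)]
    traces += [[a, b] for a, b in combinations(named_v, 2)]  # full exploration
    merged = {f"*{i}": "*" for i in range(1, s + 1)}
    return (consistent(traces, merged),
            anonymous_triangles(adjacency(traces, merged)),
            anonymous_triangles(adjacency(traces)))


print(lemma_4_6_example(5))  # (True, 10, 0)
ratio_traces = [["v", "*1", "w"], ["u", "*2", "w"], ["u", "v"]]
print(anonymous_triangles(adjacency(ratio_traces, {"*1": "*", "*2": "*"})),
      anonymous_triangles(adjacency(ratio_traces)))  # 1 0
```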